1 Key Takeaways

  • Data on Turkey’s incoming visitors, broken down by nationality, is analyzed.
  • Seasonality is analyzed, and the effects of terrorism, development level, visa policy, exchange rates and distance on the number of tourists are examined.
  • Notable findings include:
    • Visitors mostly prefer the summer season, and the number of visitors trends upward over the years, except in 2016, the year terrorism data peaks and the coup attempt takes place, and in 2020, the year the COVID-19 pandemic spreads.
    • Analysis of the HDI, an index of a country’s development level, shows that Turkey hosts more tourists from developed countries.
    • Political issues also affect the number of tourists. The number of Russian tourists increases until 2015 and then drops suddenly, most probably because relations between Russia and Turkey deteriorated after Turkey shot down a Russian military aircraft in November 2015. The number rises again with the normalization process at the end of 2016.

2 Downloading Data

This is an exploratory data analysis of the nationalities of Turkey’s incoming tourists between 2008 and 2020. The data was downloaded from TCMB’s website and translated into English.

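#Packages used below: tidyverse (dplyr, tidyr, ggplot2), readxl (read_excel), lubridate (parse_date_time, year, month)
library(tidyverse)
library(readxl)
library(lubridate)
raw_df <- read_excel("milliyetlere_gore_ziyaretci_sayisi.xlsx")
raw_df %>% glimpse()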
## Rows: 150
## Columns: 103
## $ Date                                  <chr> "2008-01", "2008-02", "2008-0...
## $ Germany                               <dbl> 177233, 143666, 249797, 24253...
## $ Albania                               <dbl> 2811, 2604, 3626, 3219, 4156,...
## $ Austria                               <dbl> 20207, 16295, 23558, 22668, 3...
## $ Belgium                               <dbl> 12389, 11309, 21097, 30772, 5...
## $ `Bosnia and Herzegovina`              <dbl> 2546, 2342, 2952, 3539, 4709,...
## $ Bulgaria                              <dbl> 99048, 82707, 102877, 110627,...
## $ `Czech Republic`                      <dbl> 1824, 2064, 2524, 4198, 9286,...
## $ Denmark                               <dbl> 5613, 5464, 11288, 10878, 260...
## $ Estonia                               <dbl> 426, 515, 695, 1098, 4508, 46...
## $ Finland                               <dbl> 1921, 1723, 4142, 7661, 12479...
## $ France                                <dbl> 30796, 29228, 36832, 73221, 8...
## $ Cyprus                                <dbl> 296, 369, 435, 527, 888, 898,...
## $ Croatia                               <dbl> 2354, 1738, 2649, 2384, 2997,...
## $ Netherlands                           <dbl> 32012, 23565, 34660, 47950, 1...
## $ `United Kingdom`                      <dbl> 35215, 33385, 46445, 85841, 2...
## $ Ireland                               <dbl> 2086, 1723, 3593, 3846, 11936...
## $ Spain                                 <dbl> 7724, 8331, 27664, 20599, 319...
## $ Sweden                                <dbl> 8156, 5808, 11248, 14843, 474...
## $ Switzerland                           <dbl> 9655, 8947, 13368, 14681, 224...
## $ Italy                                 <dbl> 24918, 15204, 25805, 34936, 5...
## $ Iceland                               <dbl> 72, 99, 367, 622, 463, 2065, ...
## $ Montenegro                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Kosovo                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Latvia                                <dbl> 1126, 1309, 2050, 2251, 6950,...
## $ Lithuania                             <dbl> 1076, 1198, 2262, 3400, 14423...
## $ Luxembourg                            <dbl> 164, 155, 341, 394, 2813, 606...
## $ Hungary                               <dbl> 2266, 2281, 3085, 3974, 5364,...
## $ Macedonia                             <dbl> 5480, 5253, 5930, 7985, 8042,...
## $ Malta                                 <dbl> 114, 110, 280, 259, 242, 262,...
## $ Norway                                <dbl> 3488, 3203, 8424, 9692, 22667...
## $ Poland                                <dbl> 4543, 5009, 5100, 9547, 34855...
## $ Portugal                              <dbl> 810, 999, 1849, 2006, 3393, 4...
## $ Romania                               <dbl> 16966, 21941, 21451, 26194, 3...
## $ Serbia                                <dbl> 8449, 7248, 8441, 9357, 11950...
## $ `Slovak Republic`                     <dbl> 883, 1379, 1408, 1957, 2815, ...
## $ Slovenia                              <dbl> 1279, 1378, 1322, 1910, 3649,...
## $ Greece                                <dbl> 27656, 21990, 44384, 46099, 5...
## $ `Other Europe Countries`              <dbl> 91, 60, 145, 133, 170, 230, 2...
## $ `Total Europe`                        <dbl> 551693, 470599, 732094, 86179...
## $ Azerbaijan                            <dbl> 29978, 33028, 37111, 35267, 3...
## $ Belarus                               <dbl> 1764, 2999, 3040, 4130, 14623...
## $ Armenia                               <dbl> 2500, 3586, 3731, 4011, 4756,...
## $ Georgia                               <dbl> 40517, 43820, 49739, 57554, 7...
## $ Kazakhstan                            <dbl> 6059, 8777, 8070, 8178, 12683...
## $ Kyrgyzstan                            <dbl> 3859, 3398, 3597, 3474, 3678,...
## $ Moldova                               <dbl> 7769, 9322, 10433, 11654, 150...
## $ Uzbekistan                            <dbl> 3431, 4520, 4133, 3991, 4645,...
## $ Russia                                <dbl> 52741, 46999, 52034, 64565, 2...
## $ Tajikistan                            <dbl> 2965, 4837, 6169, 4361, 3873,...
## $ Turkmenistan                          <dbl> 5221, 5407, 6562, 6825, 6740,...
## $ Ukraine                               <dbl> 20789, 21474, 23654, 29568, 9...
## $ `Total UIS`                           <dbl> 177593, 188167, 208273, 23357...
## $ `United States`                       <dbl> 22581, 18562, 29479, 38922, 8...
## $ Argentina                             <dbl> 662, 612, 789, 916, 2358, 236...
## $ Brazil                                <dbl> 1442, 1446, 1318, 2387, 5157,...
## $ Canada                                <dbl> 3670, 2981, 5090, 7395, 18578...
## $ Colombia                              <dbl> 277, 107, 238, 283, 488, 656,...
## $ Mexico                                <dbl> 720, 546, 1430, 1501, 2287, 2...
## $ Chile                                 <dbl> 185, 554, 248, 527, 1094, 101...
## $ Venezuela                             <dbl> 494, 125, 412, 469, 744, 789,...
## $ `Other America Countries`             <dbl> 1590, 1895, 1857, 2235, 3368,...
## $ `Total America`                       <dbl> 31621, 26828, 40861, 54635, 1...
## $ Algeria                               <dbl> 3719, 2825, 3936, 4898, 5284,...
## $ Morocco                               <dbl> 2088, 2094, 2338, 2774, 3274,...
## $ `South Africa`                        <dbl> 586, 578, 1077, 1267, 2244, 2...
## $ Libya                                 <dbl> 2526, 1871, 2672, 3172, 3379,...
## $ Egypt                                 <dbl> 2320, 3201, 3412, 3441, 4073,...
## $ Sudan                                 <dbl> 329, 220, 336, 330, 562, 931,...
## $ Tunisia                               <dbl> 2541, 2216, 3684, 2877, 4135,...
## $ `East Africa`                         <dbl> 1164, 1600, 1709, 1826, 2807,...
## $ `Total Africa`                        <dbl> 15273, 14605, 19164, 20585, 2...
## $ `United Arab Emirates`                <dbl> 568, 252, 605, 405, 617, 1458...
## $ Bahrain                               <dbl> 194, 166, 247, 232, 323, 520,...
## $ Bangladesh                            <dbl> 226, 357, 245, 224, 267, 356,...
## $ China                                 <dbl> 3848, 4972, 4014, 4586, 6100,...
## $ Indonesia                             <dbl> 545, 486, 910, 1149, 1500, 19...
## $ Philippines                           <dbl> 1248, 2018, 2040, 1971, 2456,...
## $ `South Korea`                         <dbl> 13933, 10652, 8717, 12782, 12...
## $ India                                 <dbl> 2825, 2688, 4180, 4201, 6783,...
## $ Iraq                                  <dbl> 8771, 10418, 13165, 12539, 13...
## $ Iran                                  <dbl> 29120, 36619, 81837, 63238, 8...
## $ Israel                                <dbl> 16500, 19408, 29858, 50992, 4...
## $ Japan                                 <dbl> 10985, 11197, 12858, 11516, 1...
## $ `Turkish Republic of Northern Cyprus` <dbl> 10696, 15090, 11621, 14283, 1...
## $ Qatar                                 <dbl> 91, 105, 113, 158, 154, 272, ...
## $ Kuwait                                <dbl> 275, 478, 256, 896, 884, 1707...
## $ Lebanon                               <dbl> 1836, 1719, 2842, 3939, 2565,...
## $ Malaysia                              <dbl> 1153, 1679, 2424, 1840, 2614,...
## $ Pakistan                              <dbl> 1858, 1685, 1800, 1746, 2659,...
## $ Singapore                             <dbl> 736, 909, 1117, 1302, 1693, 2...
## $ Syria                                 <dbl> 24464, 24949, 27203, 27704, 3...
## $ `Saudi Arabia`                        <dbl> 721, 1499, 1016, 2074, 2025, ...
## $ Thailand                              <dbl> 563, 516, 1003, 1081, 1005, 7...
## $ Jordan                                <dbl> 2787, 2594, 3280, 3455, 4336,...
## $ Yemen                                 <dbl> 116, 299, 287, 331, 444, 426,...
## $ `Other Asia Countries`                <dbl> 2921, 3191, 4073, 4547, 7854,...
## $ `Total Asia`                          <dbl> 136980, 153946, 215711, 22719...
## $ Australia                             <dbl> 4823, 3254, 3571, 10084, 1301...
## $ `New Zealand`                         <dbl> 497, 394, 592, 2249, 2606, 26...
## $ Oceania                               <dbl> 3, 11, 69, 70, 11, 11, 17, 36...
## $ Haymatlos                             <dbl> 1056, 947, 1233, 1304, 1481, ...
## $ `Grand Total`                         <dbl> 919539, 858751, 1221568, 1411...

As can be seen, the data frame contains 150 monthly observations of 103 columns: 1 is the Date (in Year-Month form), 96 are individual countries or residual groupings (such as Other Europe Countries or Haymatlos), and 6 are totals (five regional totals and the Grand Total).
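As a quick sanity check on the column structure, the sketch below tests whether the Grand Total equals the five regional totals plus the four columns that sit outside any regional total (Australia, New Zealand, Oceania and Haymatlos). The numbers are the first three months copied from the glimpse() output above; the snake_case column names and the `check_df` object are made up for this illustration, and the check says nothing about months not shown.

```r
# First three months (2008-01 to 2008-03), copied from the glimpse() output.
# Column names are illustrative, not the names used in the Excel file.
check_df <- data.frame(
  total_europe  = c(551693, 470599, 732094),
  total_uis     = c(177593, 188167, 208273),
  total_america = c(31621, 26828, 40861),
  total_africa  = c(15273, 14605, 19164),
  total_asia    = c(136980, 153946, 215711),
  australia     = c(4823, 3254, 3571),
  new_zealand   = c(497, 394, 592),
  oceania       = c(3, 11, 69),
  haymatlos     = c(1056, 947, 1233),
  grand_total   = c(919539, 858751, 1221568)
)

# Add up every column except the grand total, row by row.
part_cols <- setdiff(names(check_df), "grand_total")
parts <- rowSums(check_df[, part_cols])

# One logical per month: do the parts reproduce the published Grand Total?
totals_match <- unname(parts == check_df$grand_total)
print(totals_match)
```

If a check like this holds for all rows, the columns containing "Total" are pure aggregates and can be dropped without losing information.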

3 Preprocessing Data

First, we need to convert the Date column from chr to dttm (date-time) format.

#Parsing the Date column and changing them to required format
raw_df$Date <- parse_date_time(raw_df$Date, "ym")
raw_df %>% select(Date) %>% glimpse()
## Rows: 150
## Columns: 1
## $ Date <dttm> 2008-01-01, 2008-02-01, 2008-03-01, 2008-04-01, 2008-05-01, 2...
#Let's form another data frame not including the totals
df <- raw_df %>% select(-contains("Total")) 
#These are the unique country names
countries <- df %>% select(-Date) %>% colnames()

4 Data Analysis

4.1 Total visitors from countries by year and month

Let’s see which countries sent the most and the fewest visitors to Turkey in this period, both in total and on average per month. Below are the 10 countries with the most visitors and the 10 with the fewest:

totalsums <- df %>% select(-Date) %>% summarise(across(everything(),sum)) %>% sort(decreasing = TRUE) 
top10_s <- t(totalsums %>% select(1:10))
last10_s <- t(totalsums %>% select(87:96))
top10_s
##                    [,1]
## Germany        56480704
## Russia         46934888
## United Kingdom 28369408
## Bulgaria       21681788
## Iran           20221227
## Georgia        19886761
## Netherlands    13736849
## France         10688681
## Ukraine        10602414
## Greece          8325941
last10_s
##                          [,1]
## Cyprus                 164311
## Chile                  161156
## Luxembourg             132672
## Sudan                  122162
## Bangladesh             112580
## Venezuela              102757
## Malta                   76191
## Iceland                 73569
## Other Europe Countries  31809
## Oceania                 11428
totalmeans <- df %>% select(-Date) %>% summarise(across(everything(),mean)) %>% sort(decreasing = TRUE) 
top10_m <- t(totalmeans %>% select(1:10))
last10_m <- t(totalmeans %>% select(87:96))
top10_m
##                     [,1]
## Germany        376538.03
## Russia         312899.25
## United Kingdom 189129.39
## Bulgaria       144545.25
## Iran           134808.18
## Georgia        132578.41
## Netherlands     91578.99
## France          71257.87
## Ukraine         70682.76
## Greece          55506.27
last10_m
##                              [,1]
## Cyprus                 1095.40667
## Chile                  1074.37333
## Luxembourg              884.48000
## Sudan                   814.41333
## Bangladesh              750.53333
## Venezuela               685.04667
## Malta                   507.94000
## Iceland                 490.46000
## Other Europe Countries  212.06000
## Oceania                  76.18667

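Both lists contain the same countries in the same order whether we rank by sums or by means. This is expected: every column has the same 150 monthly observations, so each mean is simply the sum divided by 150. We can therefore put them in the same data frame.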
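The reason the two rankings coincide can be shown in a few lines: when every column has the same number of rows, dividing each sum by that constant cannot change the order. The `toy` data frame below is made up; only the Germany figures at the end come from the output above.

```r
# Made-up visitor counts for three hypothetical countries over four months.
toy <- data.frame(
  A = c(10, 20, 30, 40),
  B = c(5, 5, 5, 5),
  C = c(100, 0, 50, 50)
)

# Rank the columns by total and by monthly mean.
rank_by_sum  <- names(sort(colSums(toy),  decreasing = TRUE))
rank_by_mean <- names(sort(colMeans(toy), decreasing = TRUE))
same_ranking <- identical(rank_by_sum, rank_by_mean)

# The same relation with the real Germany total from the output above:
# 150 monthly observations, so mean = sum / 150 (about 376538.03).
germany_mean <- 56480704 / 150
```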

top10 <- bind_cols(means = top10_m, sums= top10_s, names= rownames(top10_m)) 
last10 <- bind_cols(means = last10_m, sums= last10_s, names= rownames(last10_m)) 
top10  %>% ggplot(aes(x=names, y=means, fill=names)) + geom_col() +theme_minimal() + theme(legend.position = "None") +labs(x="Country",y="Average Number of Tourists",title="Countries from which Tourists Come the Most")

last10 %>% ggplot(aes(x=names, y=means,fill=names)) + geom_col() +theme_minimal() + 
  theme(legend.position = "None", axis.text.x = element_text(angle = 45)) +
  labs(x="Country",y="Average Number of Tourists",title="Countries from which Tourists Come the Least")

We can also check which country sent the most visitors in each year and in each calendar month:

#Creating a data frame with yearly aggregates
year_df <- df %>% mutate(Year= year(Date)) %>% select (-Date) %>% relocate(Year) %>% group_by(Year) %>% summarise_all(list(sum))
#Finding the name of the country which is maximum of each year
year_df$max = names(year_df)[apply(year_df, 1, which.max)]
year_df %>% select(Year,max)
## # A tibble: 13 x 2
##     Year max    
##    <dbl> <chr>  
##  1  2008 Germany
##  2  2009 Germany
##  3  2010 Germany
##  4  2011 Germany
##  5  2012 Germany
##  6  2013 Germany
##  7  2014 Germany
##  8  2015 Germany
##  9  2016 Germany
## 10  2017 Russia 
## 11  2018 Russia 
## 12  2019 Russia 
## 13  2020 Germany
month_df <- df %>% mutate(Month= month(Date)) %>% select (-Date) %>% relocate(Month) %>% group_by(Month) %>% summarise_all(list(sum))
#Finding the name of the country which is maximum of each month
month_df$max = names(month_df)[apply(month_df, 1, which.max)]
month_df %>% select(Month,max)
## # A tibble: 12 x 2
##    Month max    
##    <dbl> <chr>  
##  1     1 Germany
##  2     2 Germany
##  3     3 Germany
##  4     4 Germany
##  5     5 Russia 
##  6     6 Russia 
##  7     7 Russia 
##  8     8 Germany
##  9     9 Germany
## 10    10 Germany
## 11    11 Germany
## 12    12 Germany
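The lookup above applies which.max to the whole row, including the Year (or Month) key. That works here because the visitor counts are far larger than the key values, but restricting the search to the country columns makes the intent explicit. A minimal base R sketch with made-up numbers (`toy_year` and its values are hypothetical):

```r
# Hypothetical yearly totals for two countries.
toy_year <- data.frame(
  Year    = c(2008, 2017),
  Germany = c(30, 20),
  Russia  = c(10, 40)
)

# Search only the country columns, never the key column.
country_cols <- setdiff(names(toy_year), "Year")

# max.col() returns, for each row, the index of the largest column;
# ties.method = "first" keeps the result deterministic.
top_idx <- max.col(as.matrix(toy_year[country_cols]), ties.method = "first")
toy_year$max <- country_cols[top_idx]
```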

Let’s see which continents the tourists came from the most.

total_df <- raw_df %>% select("Date" | contains("Total"))
total_df %>% pivot_longer(.,-Date) %>% ggplot(.,aes(x=Date,y=value,color=name)) + geom_line()+
    labs(x="Date", y="Number of Tourists",title="Number of Tourists by Continents",color="Continents")

total_df %>% mutate(Year = year(Date)) %>% select(-Date) %>% group_by(Year) %>% summarise_all(list(mean)) %>% 
  pivot_longer(.,-Year) %>% ggplot(.,aes(x=Year,y=value,color=name)) + geom_line()+
    labs(x="Date", y="Number of Tourists",title="Number of Tourists by Continents (Yearly)",color="Continents")

It’s worth checking the months in which the most tourists visited Turkey.

monthly_sum <- raw_df %>% mutate(month=month(Date)) %>% select(`Grand Total`,month,-Date) %>% relocate(month) %>% group_by(month) %>% summarise(sum=sum(`Grand Total`))
ggplot(monthly_sum) + geom_col(aes(x=month,y=sum)) + labs(title = "Visitors by Month", x="Month", y="Number of Visitors") + theme_minimal() 

#Add names of the months here
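One possible way to handle the note about month names is to turn the month number into an ordered factor labeled with R's built-in `month.abb` constant. This is only a sketch: `monthly_sum_demo` and its values are made up and stand in for `monthly_sum`.

```r
# Hypothetical monthly totals standing in for monthly_sum.
monthly_sum_demo <- data.frame(
  month = 1:12,
  sum   = seq(100, 1200, by = 100)
)

# Map 1..12 to "Jan".."Dec"; fixing the levels keeps calendar order
# on a plot axis instead of alphabetical order.
monthly_sum_demo$month_name <- factor(
  monthly_sum_demo$month,
  levels = 1:12,
  labels = month.abb
)
```

Using `aes(x = month_name, y = sum)` in the ggplot call would then label the bars Jan to Dec.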

4.2 Seasonality Analysis

We can explore the effect of seasonality in the following way:

ts <- ts(total_df$`Grand Total`, frequency = 12, start = 2008) 
decomposed <- decompose(ts, type="additive")
plot(decomposed)

The decomposition plot shows an increasing trend. In 2016, however, there is a significant drop in the number of tourists, which can be attributed to the coup attempt that summer.
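To make clear what an additive decomposition returns, the sketch below builds a synthetic monthly series (all numbers made up) and checks that trend + seasonal + random reproduces it. The trend is a centered moving average, so for monthly data it is missing for the first and last six months.

```r
set.seed(1)  # reproducible noise

# Five hypothetical years of monthly data: linear trend + fixed seasonal
# pattern (sums to zero over a year) + small random noise.
n_months      <- 60
trend_part    <- seq(100, 159, length.out = n_months)
seasonal_part <- rep(c(-20, -15, -5, 5, 15, 25, 30, 25, 10, -10, -25, -35), times = 5)
noise_part    <- rnorm(n_months, sd = 2)
toy_ts <- ts(trend_part + seasonal_part + noise_part, frequency = 12, start = 2008)

toy_dec <- decompose(toy_ts, type = "additive")

# Additive model: series = trend + seasonal + random (where the trend exists).
rebuilt <- as.numeric(toy_dec$trend + toy_dec$seasonal + toy_dec$random)
ok      <- !is.na(rebuilt)
max_gap <- max(abs(rebuilt[ok] - as.numeric(toy_ts)[ok]))

# Number of months without a trend estimate (6 at each end).
n_missing_trend <- sum(is.na(toy_dec$trend))
```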

This is the original plot of the data:

total_df %>% ggplot(aes(x=Date,y=`Grand Total`)) +geom_line() + theme_minimal() +
  labs(title="Total Number of Tourists with Seasonality",y="Number of Tourists")

This is the deseasonalized version of the data:

library(forecast) #provides seasadj() and seasonplot()
ts.stl <- stl(ts,"periodic") 
ts.sa <- seasadj(ts.stl) 
total_df %>% ggplot(aes(x=Date, y= ts.sa)) +geom_line() + theme_minimal() +
  labs(title="Total Number of Tourists without Seasonality",y="Number of Tourists")

seasonplot(ts.sa, 12, col=rainbow(13), season.labels=TRUE, year.labels=TRUE, main="Seasonal Visitors") #Make these clearer?

We can see that the 2020 data looks very different from the other years and trends downward due to COVID-19.
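For readers who want to see what the seasonal adjustment amounts to, the sketch below does it with base R only: it fits stl() on a synthetic series (made-up numbers) and subtracts the seasonal component. To our understanding this is equivalent to what seasadj() returns for an stl fit, but treat that as an assumption rather than a statement about the forecast package internals.

```r
set.seed(2)  # reproducible noise

# Six hypothetical years of monthly data with a fixed seasonal pattern.
seasonal_pattern <- c(-20, -15, -5, 5, 15, 25, 30, 25, 10, -10, -25, -35)
toy_ts <- ts(100 + 0.5 * (1:72) + rep(seasonal_pattern, times = 6) + rnorm(72),
             frequency = 12, start = 2008)

# s.window = "periodic" forces the same seasonal shape in every year.
toy_stl <- stl(toy_ts, s.window = "periodic")
parts   <- toy_stl$time.series  # columns: seasonal, trend, remainder

# Seasonally adjusted series = original minus seasonal component.
toy_sa <- toy_ts - parts[, "seasonal"]

# It should equal trend + remainder up to floating point error.
gap <- max(abs(as.numeric(toy_sa) -
               as.numeric(parts[, "trend"] + parts[, "remainder"])))
```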

4.3 Relation Between Number of Visitors and Terrorism in Turkey

Reading the terrorism data:

ter<-read.csv("globalterrorism_turkey.csv")

Preprocessing the terrorism data:

terrorism_turkey<-ter%>%filter(country_txt=="Turkey")%>%filter(iyear>=2008)
terrorism_turkey$natlty1_txt=NULL
colnames(terrorism_turkey)[6]  <- "City"
colnames(terrorism_turkey)[7]  <- "County"

Terrorism events are grouped by years and months in order to be examined.

number_of_events_year<-terrorism_turkey%>%group_by(iyear)%>%count()
number_of_events_month<-terrorism_turkey%>%group_by(imonth,iyear)%>%count()
colnames(number_of_events_year)[2]  <- "total"
colnames(number_of_events_year)[1]  <- "Year"
number_of_events_year$Year<- as.Date(as.character(number_of_events_year$Year), format = "%Y")
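A detail worth knowing about the year conversion above: when as.Date() is given only `format = "%Y"`, the missing month and day are filled in from the current date, so the result depends on the day the script runs. Both yearly series in this analysis are converted the same way, so they stay aligned, but pinning the date to January 1 makes the output reproducible. A small sketch (the `years` vector is illustrative):

```r
years <- c(2008, 2016, 2020)

# Only the year is specified: month and day come from today's date.
implicit <- as.Date(as.character(years), format = "%Y")

# Explicit alternative: always the first day of the year.
explicit <- as.Date(paste0(years, "-01-01"))
```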

The tourist data is reshaped into yearly totals per country.

data_tourist_year<-data_tourist%>%pivot_longer(cols=-Date,names_to="Country", values_to="Visits")%>%group_by(Country,year(Date))%>%summarise(Total_year=sum(Visits))
colnames(data_tourist_year)[2]  <- "Year"
colnames(data_tourist_year)[3]  <- "Visits"

Terrorism data is analyzed city by city.

terrorismBycity<-terrorism_turkey%>%group_by(City, iyear)%>%count()
colnames(terrorismBycity)[2]  <- "Year"
yearly_total_visits<-data_tourist_year%>%group_by(Year)%>%summarize(total=sum(Visits))
yearly_total_visits$Year<- as.Date(as.character(yearly_total_visits$Year), format = "%Y")
class(yearly_total_visits$Year)
## [1] "Date"

The dual-axis plot below compares the yearly number of tourists with the yearly number of terrorism events.

ggplot() + 
  geom_bar(mapping = aes(x = yearly_total_visits$Year, y = yearly_total_visits$total), stat = "identity", fill = "black") +
  geom_line(mapping = aes(x = number_of_events_year$Year, y = number_of_events_year$total*100000), size = 2, color = "blue") + 
  scale_x_date(name = "Years") +
  scale_y_continuous(name = "Number of Tourists", 
                     sec.axis = sec_axis(~./100000, name = "Terrorism", 
                                         labels = function(b) { paste0(b)})) + 
  theme(
    axis.title.y = element_text(color = "black"),
    axis.title.y.right = element_text(color = "blue"))+
  labs(title="Total Number of Tourists with the Terrorism Level (yearly)")

The graph suggests a negative relationship between the number of terrorism events in Turkey and the number of tourists: on an annual basis, years with more terrorism events tend to have fewer touristic visits. Although the terrorism data frame only runs until 2017, the visualization makes this pattern easy to see. For instance, there is a clear fall in the number of tourists in 2016, probably related to the coup attempt on July 15, and it takes some time for the number of visitors to recover in the following years. Overall, the plot points to a strong negative association between the two series, although no correlation coefficient is computed here.
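The visual impression could be backed by a number. The sketch below is one way to do that, assuming two data frames that each have a `Year` and a `total` column, as `yearly_total_visits` and `number_of_events_year` do. The helper name `yearly_association` and the toy values are hypothetical and are not results from the real data; Spearman's rank correlation is chosen here because there are only a handful of yearly points.

```r
# Join two yearly series on Year and return Spearman's rank correlation.
# Only years present in both inputs are used.
yearly_association <- function(visits, events) {
  merged <- merge(visits, events, by = "Year",
                  suffixes = c("_visits", "_events"))
  cor(merged$total_visits, merged$total_events, method = "spearman")
}

# Made-up series: visits fall every year while events rise every year.
toy_visits <- data.frame(Year = 2008:2012, total = c(50, 40, 30, 20, 10))
toy_events <- data.frame(Year = 2008:2013, total = c(1, 2, 3, 4, 5, 6))

rho <- yearly_association(toy_visits, toy_events)
```

With so few yearly observations, any coefficient obtained this way would describe association only and should be read cautiously.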

4.4 Relation Between Number of Visitors and Development Level of Their Countries

In this part of the analysis, we use several development indicators to relate the number of visitors to the development level of their home countries.

Indicators used:
* GDP per capita (in current US dollars, gross domestic product divided by midyear population)
* Life expectancy
* Human Development Index (HDI is a measure that combines life expectancy, education and income indexes)

We used multiple indicators because no single measure definitively captures the development level of a country.

Loading required libraries:

pti <- c("tidyverse","readxl","zoo")
pti <- pti[!(pti %in% rownames(installed.packages()))]
if(length(pti)>0){
    install.packages(pti)
}

library(tidyverse)
library(readxl)
library(zoo)

Importing main data:

raw_data <- read_xlsx("milliyetlere_gore_ziyaretci_sayisi.xlsx")

raw_data <- raw_data %>%
   mutate(Date=as.yearmon(Date))

Creating yearly sums of visitors:
We calculated the yearly number of visitors for 2008-2019 and the average yearly number of visitors over 2008-2018. We mostly used this yearly average in our analysis.

years_total <- raw_data %>%
  t()
colnames(years_total)=years_total[1,]
years_total <- years_total[-1,]

years_total <- as.data.frame(years_total)
years_total <- rownames_to_column(years_total, "Country Name")

years_total[,2:151] <- as.numeric(unlist(years_total[,2:151]))

years_total<- years_total %>%
  rowwise() %>%
  mutate( sum_2008=sum(c_across(`Jan 2008`:`Dec 2008`)),
          sum_2009=sum(c_across(`Jan 2009`:`Dec 2009`)),
          sum_2010=sum(c_across(`Jan 2010`:`Dec 2010`)),
          sum_2011=sum(c_across(`Jan 2011`:`Dec 2011`)),
          sum_2012=sum(c_across(`Jan 2012`:`Dec 2012`)),
          sum_2013=sum(c_across(`Jan 2013`:`Dec 2013`)),
          sum_2014=sum(c_across(`Jan 2014`:`Dec 2014`)),
          sum_2015=sum(c_across(`Jan 2015`:`Dec 2015`)),
          sum_2016=sum(c_across(`Jan 2016`:`Dec 2016`)),
          sum_2017=sum(c_across(`Jan 2017`:`Dec 2017`)),
          sum_2018=sum(c_across(`Jan 2018`:`Dec 2018`)),
          sum_2019=sum(c_across(`Jan 2019`:`Dec 2019`)),
          avg_2008to2018_yearly=mean(c_across(`Jan 2008`:`Dec 2018`))*12
  ) %>%
  select('Country Name', sum_2008:sum_2019, avg_2008to2018_yearly)
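The yearly sums above are written out one year at a time. As an alternative sketch, the same kind of result can be obtained by keeping the data in its original layout (one row per month) and grouping by year. The `toy` data frame is made up and has only two months per year to keep it short.

```r
# Hypothetical monthly data in the original layout: one row per month.
toy <- data.frame(
  Date    = as.Date(c("2008-01-01", "2008-02-01", "2009-01-01", "2009-02-01")),
  Germany = c(10, 20, 30, 40),
  Russia  = c(1, 2, 3, 4)
)

# Derive the year and sum every country column within each year.
toy$Year <- as.integer(format(toy$Date, "%Y"))
yearly   <- aggregate(cbind(Germany, Russia) ~ Year, data = toy, FUN = sum)

# Average yearly visitors = mean of the yearly sums. For complete
# 12-month years this equals the mean monthly value times 12.
avg_yearly <- colMeans(yearly[, c("Germany", "Russia")])
```

This avoids hard-coding column names such as `Jan 2008`, at the cost of a different output shape (years in rows, countries in columns).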

Importing GDP per capita data:
Our visitor data covers only 2008 to 2020, so we need GDP per capita only for those years. Since GDP per capita data is not available for 2020, we took the years 2008 to 2019.

GDP_per_capita_data <- read_xls("GDP_per_capita.xls")
GDP_per_capita_data %>%
  head()
## # A tibble: 6 x 64
##   `Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
##   <chr>          <chr>          <chr>            <chr>             <dbl>  <dbl>
## 1 Aruba          ABW            GDP per capita ~ NY.GDP.PCAP.CD     NA     NA  
## 2 Afghanistan    AFG            GDP per capita ~ NY.GDP.PCAP.CD     59.8   59.9
## 3 Angola         AGO            GDP per capita ~ NY.GDP.PCAP.CD     NA     NA  
## 4 Albania        ALB            GDP per capita ~ NY.GDP.PCAP.CD     NA     NA  
## 5 Andorra        AND            GDP per capita ~ NY.GDP.PCAP.CD     NA     NA  
## 6 Arab World     ARB            GDP per capita ~ NY.GDP.PCAP.CD     NA     NA  
## # ... with 58 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>,
## #   `1965` <dbl>, `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
## #   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
## #   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
## #   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
## #   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
## #   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>,
## #   `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, `1999` <dbl>,
## #   `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, `2004` <dbl>,
## #   `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>,
## #   `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>,
## #   `2015` <dbl>, `2016` <dbl>, `2017` <dbl>, `2018` <dbl>, `2019` <dbl>
GDP_per_capita_data <- GDP_per_capita_data %>%
  select(`Country Name`,`2008`:`2019`) 

GDP_per_capita_data <- as.data.frame(GDP_per_capita_data)

Joining the data:

colnames(GDP_per_capita_data)[2:13] <- paste0("GDP_per_capita_", colnames(GDP_per_capita_data)[2:13])

joined_data_GDP_per_capita <- years_total %>%
  inner_join(GDP_per_capita_data, by=c(`Country Name`= 'Country Name' )) %>%
  mutate(avg_GDPperCap_2008to2018=mean(c_across(GDP_per_capita_2008:GDP_per_capita_2018), na.rm = TRUE))
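Because inner_join() silently drops rows whose country names do not match exactly, it may be worth listing the unmatched names before joining. The sketch below uses made-up name vectors. The spellings "Russian Federation" and "Iran, Islamic Rep." are used as examples of how an indicator file might differ from the visitor file; whether the files used here actually differ this way is an assumption, and `name_map` is a hypothetical lookup.

```r
# Illustrative country names from the two sources.
visitor_names   <- c("Germany", "Russia", "Iran", "Bulgaria")
indicator_names <- c("Germany", "Russian Federation", "Iran, Islamic Rep.", "Bulgaria")

# Names in the visitor data with no exact match in the indicator data.
unmatched <- setdiff(visitor_names, indicator_names)

# Hypothetical lookup from indicator spelling to visitor spelling.
name_map <- c("Russian Federation" = "Russia", "Iran, Islamic Rep." = "Iran")

# Replace mapped names, keep all others unchanged.
harmonised <- unname(ifelse(indicator_names %in% names(name_map),
                            name_map[indicator_names],
                            indicator_names))

still_unmatched <- setdiff(visitor_names, harmonised)
```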

To simplify the analysis, we divided countries into 4 levels using the quartiles of average GDP per capita. We then plotted average GDP per capita against average yearly visitors, colored by GDP per capita level. We also summed the yearly average visitors within each group and plotted a pie chart to see how Turkey’s visitors break down by the development level of their countries.

q1 <- quantile(joined_data_GDP_per_capita$avg_GDPperCap_2008to2018, na.rm = TRUE)

joined_data_GDP_per_capita <- joined_data_GDP_per_capita %>%
  mutate(GDP_per_capita_level=case_when((avg_GDPperCap_2008to2018>=q1[1]&avg_GDPperCap_2008to2018<q1[2])~"very low",
                                        (avg_GDPperCap_2008to2018>=q1[2]&avg_GDPperCap_2008to2018<q1[3])~"low",
                                        (avg_GDPperCap_2008to2018>=q1[3]&avg_GDPperCap_2008to2018<q1[4])~"medium",
                                        (avg_GDPperCap_2008to2018>=q1[4]&avg_GDPperCap_2008to2018<=q1[5])~"high" 
                                        )
         )

joined_data_GDP_per_capita$GDP_per_capita_level <- factor(joined_data_GDP_per_capita$GDP_per_capita_level, levels = c("high", "medium","low","very low"))

joined_data_GDP_per_capita %>%
  ggplot(aes(x=avg_2008to2018_yearly,y=avg_GDPperCap_2008to2018, color=GDP_per_capita_level))+
  geom_point() +
  labs(x="2008-2018 yearly number of visitors average", y="2008-2018 average GDP per capita",
       title="Average GDP per Capita") +
  scale_color_discrete(name="GDP per capita levels of countries")

GDP_per_capita_level_grouped <- joined_data_GDP_per_capita %>%
  group_by(GDP_per_capita_level) %>%
  summarise(sum=sum(avg_2008to2018_yearly))

ggplot(GDP_per_capita_level_grouped, aes(x="",y=sum,fill=GDP_per_capita_level))+geom_bar(stat="identity",width=1)+coord_polar("y")+labs(x="",y="Total yearly average visitors",title="GDP per Capita Level") +
  scale_fill_discrete(name="GDP per capita levels of countries")
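The quartile binning used above (intervals closed on the left, with the maximum included in the top group) can also be expressed with base R's cut(). This is a sketch on made-up values, not a replacement for the case_when() call:

```r
# Made-up indicator values for eight hypothetical countries.
values <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Quartile boundaries: 0%, 25%, 50%, 75%, 100%.
q <- quantile(values, na.rm = TRUE)

# right = FALSE gives intervals [a, b); include.lowest = TRUE closes the
# last interval so the maximum value falls into "high".
level <- cut(values, breaks = q,
             labels = c("very low", "low", "medium", "high"),
             right = FALSE, include.lowest = TRUE)

# Same level ordering as in the analysis, for plotting legends.
level <- factor(level, levels = c("high", "medium", "low", "very low"))
```

Note that cut() requires distinct break points, so this form would fail if two quartiles happened to coincide.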

The top 5 countries whose citizens visited Turkey the most belong to either the high or the low GDP per capita level. It is worth mentioning that people from countries in the medium category do not visit Turkey much, unlike those in the high and low categories. We noticed that medium-level countries are mostly from central Europe, such as Slovenia, Poland and the Czech Republic, or are very far from Turkey, such as Venezuela and South Korea.

Importing life expectancy data of countries:

life_expectancy_data <- read_xls("Life_expectancy.xls")

life_expectancy_data %>%
  head()
## # A tibble: 6 x 64
##   `Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
##   <chr>          <chr>          <chr>            <chr>             <dbl>  <dbl>
## 1 Aruba          ABW            Life expectancy~ SP.DYN.LE00.IN     65.7   66.1
## 2 Afghanistan    AFG            Life expectancy~ SP.DYN.LE00.IN     32.4   33.0
## 3 Angola         AGO            Life expectancy~ SP.DYN.LE00.IN     37.5   37.8
## 4 Albania        ALB            Life expectancy~ SP.DYN.LE00.IN     62.3   63.3
## 5 Andorra        AND            Life expectancy~ SP.DYN.LE00.IN     NA     NA  
## 6 Arab World     ARB            Life expectancy~ SP.DYN.LE00.IN     46.5   47.1
## # ... with 58 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>,
## #   `1965` <dbl>, `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
## #   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
## #   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
## #   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
## #   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
## #   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>,
## #   `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, `1999` <dbl>,
## #   `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, `2004` <dbl>,
## #   `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>,
## #   `2010` <dbl>, `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>,
## #   `2015` <dbl>, `2016` <dbl>, `2017` <dbl>, `2018` <dbl>, `2019` <lgl>
life_expectancy_data <- life_expectancy_data %>%
  select(`Country Name`,`2008`:`2018`)

life_expectancy_data <- as.data.frame(life_expectancy_data)

Joining the data:

colnames(life_expectancy_data)[2:12] <- paste0("life_expectancy_", colnames(life_expectancy_data)[2:12])

joined_data_life_expectancy <- years_total %>%
  inner_join(life_expectancy_data, by=c('Country Name'='Country Name')) %>%
  mutate(avg_life_expectancy_2008to2018=mean(c_across(life_expectancy_2008:life_expectancy_2018)))

We divided countries into 4 categories using the quartiles of average life expectancy. We then plotted average life expectancy against average yearly visitors, colored by the life expectancy level of the countries. We also summed the yearly average visitors within each group and plotted a pie chart to see the breakdown of visitor countries by development level.

q2 <- quantile(joined_data_life_expectancy$avg_life_expectancy_2008to2018, na.rm = TRUE)

joined_data_life_expectancy <- joined_data_life_expectancy %>%
  mutate(life_expectancy_level=case_when((avg_life_expectancy_2008to2018>=q2[1]&avg_life_expectancy_2008to2018<q2[2])~"very low",
                                        (avg_life_expectancy_2008to2018>=q2[2]&avg_life_expectancy_2008to2018<q2[3])~"low",
                                        (avg_life_expectancy_2008to2018>=q2[3]&avg_life_expectancy_2008to2018<q2[4])~"medium",
                                        (avg_life_expectancy_2008to2018>=q2[4]&avg_life_expectancy_2008to2018<=q2[5])~"high" 
  )
  )

joined_data_life_expectancy$life_expectancy_level <- factor(joined_data_life_expectancy$life_expectancy_level, levels = c("high", "medium","low","very low"))

joined_data_life_expectancy %>%
  ggplot(aes(x=avg_2008to2018_yearly,y=avg_life_expectancy_2008to2018, color=life_expectancy_level)) +
  geom_point()+
  labs(x="2008-2018 yearly number of visitors average", y="2008-2018 average life expectancy",
       title="Average Life Expectancy") +
  scale_color_discrete(name="Life expectancy levels of countries")

life_expectancy_level_grouped <- joined_data_life_expectancy %>%
  group_by(life_expectancy_level) %>%
  summarise(sum=sum(avg_2008to2018_yearly))

ggplot(life_expectancy_level_grouped, aes(x="",y=sum,fill=life_expectancy_level))+geom_bar(stat="identity",width=1)+coord_polar("y")+labs(x="",y="Total yearly average visitors", title="Life Expectancy Level")+
  scale_fill_discrete(name="Life expectancy levels of countries")

Similarly, we do not see many medium-level countries among the top countries, except Germany. Unlike the pie chart for GDP per capita, the life expectancy pie chart is close to an equally divided pie. We can conclude that Turkey’s visitors come fairly evenly from countries at all levels of life expectancy.

Importing Human Development Index (HDI) data:

HDI_data <- read_xlsx("Human_development_index_HDI.xlsx")

HDI_data %>%
  head()
## # A tibble: 6 x 30
##   Country `1990` `1991` `1992` `1993` `1994` `1995` `1996` `1997` `1998` `1999`
##   <chr>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
## 1 Afghan~ 0.297~ 0.303~ 0.312  0.308  0.302~ 0.327~ 0.331~ 0.335~ 0.339~ 0.343~
## 2 Albania 0.644~ 0.625  0.607~ 0.610~ 0.616~ 0.629  0.639~ 0.639~ 0.649~ 0.66  
## 3 Algeria 0.577~ 0.581~ 0.588~ 0.592~ 0.596~ 0.601~ 0.61   0.618~ 0.629  0.638~
## 4 Andorra N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A   
## 5 Angola  N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    0.384~
## 6 Antigu~ N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A   
## # ... with 19 more variables: `2000` <chr>, `2001` <chr>, `2002` <chr>,
## #   `2003` <chr>, `2004` <chr>, `2005` <chr>, `2006` <chr>, `2007` <chr>,
## #   `2008` <chr>, `2009` <chr>, `2010` <chr>, `2011` <chr>, `2012` <chr>,
## #   `2013` <chr>, `2014` <chr>, `2015` <chr>, `2016` <chr>, `2017` <chr>,
## #   `2018` <chr>
colnames(HDI_data) <- gsub("X","",colnames(HDI_data))

HDI_data <- HDI_data %>%
  select('Country Name'= Country, '2008':'2018')

HDI_data <- as.data.frame(HDI_data)

Joining and plotting the data:
We classified countries by their HDI into 4 categories: “very high”, “high”, “medium” and “low” (source: https://en.wikipedia.org/wiki/Developed_country )

colnames(HDI_data)[2:12] <- paste0("HDI_", colnames(HDI_data)[2:12])

HDI_data[,2:12] <- as.numeric(unlist(HDI_data[,2:12]))

joined_data_HDI <- years_total %>%
  inner_join(HDI_data, by=c('Country Name'='Country Name')) %>%
  mutate(avg_HDI_2008to2018=mean(c_across(HDI_2008:HDI_2018),na.rm = TRUE)) %>%
  mutate(HDI_level=case_when(avg_HDI_2008to2018>=0.8~"very high",
                             (avg_HDI_2008to2018<0.8&avg_HDI_2008to2018>=0.7)~"high",
                             (avg_HDI_2008to2018<0.7&avg_HDI_2008to2018>=0.55)~"medium",
                             avg_HDI_2008to2018<0.55~"low"
                             )
         )

joined_data_HDI$HDI_level <- factor(joined_data_HDI$HDI_level, levels = c("very high","high","medium","low"))

joined_data_HDI %>%
  ggplot(aes(x=avg_2008to2018_yearly,y=avg_HDI_2008to2018, color=HDI_level)) +
  geom_point()+
  labs(x="2008-2018 yearly number of visitors average", y="2008-2018 average HDI",
       title="Average HDI") +
  scale_color_discrete(name="HDI levels of countries")

HDI_level_grouped <- joined_data_HDI %>%
  group_by(HDI_level) %>%
  summarise(sum=sum(avg_2008to2018_yearly))

ggplot(HDI_level_grouped, aes(x="",y=sum,fill=HDI_level))+geom_bar(stat="identity",width=1)+coord_polar("y")+labs(x="",y="Total yearly average visitors", title="HDI Level")+
  scale_fill_discrete(name="HDI levels of countries")

We realized that this may not be an ideal categorization, since there are too many countries in “high level” category and too few in “low”. Therefore we also wanted to analyze levels by quantile, like we did with other metrics.

q3 <- quantile(joined_data_HDI$avg_HDI_2008to2018, na.rm = TRUE)

joined_data_HDI <- joined_data_HDI %>%
  mutate(HDI_level_by_quantile=case_when((avg_HDI_2008to2018 >=q3[1]&avg_HDI_2008to2018<q3[2])~"very low",
                                         (avg_HDI_2008to2018>=q3[2]&avg_HDI_2008to2018<q3[3])~"low",
                                         (avg_HDI_2008to2018>=q3[3]&avg_HDI_2008to2018<q3[4])~"medium",
                                         (avg_HDI_2008to2018>=q3[4]&avg_HDI_2008to2018<=q3[5])~"high" 
  )
  )

joined_data_HDI$HDI_level_by_quantile <- factor(joined_data_HDI$HDI_level_by_quantile, levels = c("high","medium","low","very low"))

HDI_level_by_quantile_grouped <- joined_data_HDI %>%
  group_by(HDI_level_by_quantile) %>%
  summarise(sum=sum(avg_2008to2018_yearly))

joined_data_HDI %>%
  ggplot(aes(x=avg_2008to2018_yearly,y=avg_HDI_2008to2018, color=HDI_level_by_quantile)) +
  geom_point()+
  labs(x="2008-2018 yearly number of visitors average", y="2008-2018 average HDI",
       title="Average HDI") +
  scale_color_discrete(name="HDI levels of countries (by quantile)")

ggplot(HDI_level_by_quantile_grouped, aes(x="",y=sum,fill=HDI_level_by_quantile))+geom_bar(stat="identity",width=1)+coord_polar("y")+labs(x="",y="Total yearly average visitors", title="HDI Level by Quantile")+
  scale_fill_discrete(name="HDI levels of countries (by quantile)")

HDI is a more comprehensive metric than the others, so we may rely on this analysis relatively more. Again, we do not see any ‘medium’ level countries among the top visitors, and there are no ‘very low’ level countries among them either. To summarize this part of the analysis, we can state that Turkey attracts visitors mostly from countries in the 1st and 3rd quantiles of development level (the ‘high’ and ‘low’ groups).
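The quantile-based grouping used in this section can also be written with base R's cut(). The following is only a minimal sketch on made-up HDI values (avg_hdi is a hypothetical vector, not the report's data); it assumes the quantile boundaries are all distinct, since cut() stops with an error on duplicated breaks.

```r
# Toy HDI averages (made-up values, not taken from the report's data)
avg_hdi <- c(0.45, 0.52, 0.61, 0.68, 0.74, 0.79, 0.85, 0.93)

# Quartile boundaries: minimum, 25%, 50%, 75%, maximum
breaks <- quantile(avg_hdi, na.rm = TRUE)

# right = FALSE gives intervals of the form [a, b); include.lowest = TRUE
# closes the last interval so that the maximum still falls into "high",
# which mirrors the >= / < / <= conditions of the case_when() version
hdi_level <- cut(avg_hdi,
                 breaks = breaks,
                 labels = c("very low", "low", "medium", "high"),
                 right = FALSE,
                 include.lowest = TRUE)

table(hdi_level)
```

The result is already a factor, so only the level order would need to be changed if "high" should come first in the legend.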

4.5 Visa Policy of Turkey

visa<-read_excel("vize_muafiyeti.xlsx")

The pivot_longer() function is used to reshape the data into long format, with one row per date and country.

data_transpose<-data%>%pivot_longer(cols=-Date,names_to="Free Visa")

data_year contains the total number of visits for each country and year.

data_year<-data_transpose%>%group_by(year(Date), `Free Visa`)%>%summarise(total=sum(value))

The data_countries vector holds the names of the regions and countries.

data_countries<-data%>%select(-Date)%>%names()

To find the countries in the original data frame whose citizens do not need a visa to enter Turkey, the inner_join() function is used. A new column that shows the visa status is then created: “1” is assigned to these countries, which means that they are allowed to enter the country without a visa.

free_visa_df<- data_transpose%>%inner_join(visa,by="Free Visa")
free_visa_df[,4]<-1
colnames(free_visa_df)[4]  <- "Visa"
colnames(free_visa_df)[2]  <- "Country"

The other countries, whose citizens need a visa to visit Turkey, are assigned “0”. Some columns are renamed for clarity.

no_visa_df<-data_transpose%>%anti_join(visa,by="Free Visa")
no_visa_df[,4]<-0
colnames(no_visa_df)[4]  <- "Visa"
colnames(no_visa_df)[2]  <- "Country"

Then, the two data frames free_visa_df and no_visa_df are merged to obtain total_data. Now, the need for a visa is shown clearly in a single data frame.

#1 means free visa
#0 means that visa is needed
total_data<- free_visa_df%>%full_join(no_visa_df,by=c("Visa","Date","Country","value"))
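The same 0/1 flag can be produced in one step with %in%, without splitting and re-joining the data. This is only a sketch in base R on made-up rows; visits and visa_free are hypothetical stand-ins for the long-format visitor data and the visa-exemption list.

```r
# Toy long-format data (made-up values); column names mirror the report
visits <- data.frame(
  Date    = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01")),
  Country = c("Germany", "Oceania", "Russia"),
  value   = c(100, 5, 80)
)

# Hypothetical stand-in for the visa-exemption list read from Excel
visa_free <- c("Germany", "Russia")

# 1 = visa-free entry, 0 = visa required
visits$Visa <- as.integer(visits$Country %in% visa_free)
visits
```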

sum_data holds the number of visits by country and year together with the visa information. The sum_byvisa data frame looks at the effect of the visa requirement more closely: it summarizes all countries and years by visa requirement, calculating the mean and the sum of the visit counts. In the output below, the first mean belongs to the visa-required group (Visa = 0) and the second to the visa-free group (Visa = 1). Thus, even though many countries can enter Turkey without a visa, the others visit much more than the visa-free ones. So, in this data, the need for a visa does not appear to reduce the number of tourists to Turkey.

sum_data<-total_data%>%group_by(Country,Visa, year(Date))%>%summarise(total=sum(value))
sum_byvisa<-total_data%>%group_by(Visa)%>%summarise(mean=mean(value),total=sum(value))
sum_byvisa$mean
## [1] 134580.7  34193.4
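One point to keep in mind when reading these means: the region and total columns (for example “Total Europe”) are removed only in the next step, so at this stage they appear to be counted in the Visa = 0 group. The sketch below uses made-up numbers and hypothetical country names to show how such aggregate rows can shift a group mean, and how the comparison could be repeated on countries only.

```r
# Toy data (made-up values); "Country A/B/C" are hypothetical names
total_toy <- data.frame(
  Country = c("Country A", "Country B", "Country C", "Total Europe", "Grand Total"),
  Visa    = c(1, 1, 0, 0, 0),
  value   = c(400, 300, 50, 900, 1000)
)

# Mean per visa group on all rows: the aggregates sit in the Visa = 0 group
mean_all <- tapply(total_toy$value, total_toy$Visa, mean)

# Drop total/region rows first, then compare the two groups again
is_aggregate   <- grepl("Total|Africa|Other", total_toy$Country)
countries_only <- total_toy[!is_aggregate, ]
mean_countries <- tapply(countries_only$value, countries_only$Visa, mean)

mean_all
mean_countries
```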

The continent and region names are removed from the data because they don’t carry any visa information. The visit data is then ranked in descending order.

ranked_final<-sum_data%>%arrange(desc(total))  
ranked_final<-ranked_final[!grepl("Total|Africa|Other", ranked_final$Country),]
head(ranked_final,10)
## # A tibble: 10 x 4
## # Groups:   Country, Visa [2]
##    Country  Visa `year(Date)`   total
##    <chr>   <dbl>        <dbl>   <dbl>
##  1 Russia      1         2019 7000407
##  2 Russia      1         2018 5937850
##  3 Germany     1         2015 5593065
##  4 Germany     1         2014 5251870
##  5 Germany     1         2013 5048199
##  6 Germany     1         2012 5025660
##  7 Germany     1         2019 5024193
##  8 Germany     1         2011 4815156
##  9 Russia      1         2017 4699557
## 10 Germany     1         2018 4496173
tail(ranked_final,10)
## # A tibble: 10 x 4
## # Groups:   Country, Visa [4]
##    Country     Visa `year(Date)` total
##    <chr>      <dbl>        <dbl> <dbl>
##  1 Oceania        0         2011   474
##  2 Oceania        0         2013   446
##  3 Oceania        0         2017   438
##  4 Oceania        0         2015   360
##  5 Oceania        0         2020   335
##  6 Oceania        0         2008   325
##  7 Iceland        1         2020   318
##  8 Oceania        0         2009   209
##  9 Kosovo         1         2008     0
## 10 Montenegro     1         2008     0

In the final plot, the points show the total number of visits to Turkey per year, country by country. Red points mark countries on the x-axis whose citizens need a visa to enter Turkey. Blue points mark visa-free countries, whose citizens can enter without a visa.

ggplot(ranked_final,aes(x=Country,y=total,color=factor(Visa)))+
  geom_point()+
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))+
  theme(axis.text.x = element_text(angle = 90, vjust = .5)) +
  labs(y="Total Number of Tourists (yearly)", title="Number of Tourists vs Countries with Turkey's Visa Policy", color="Visa Policy")

4.6 Analysis of Exchange Rates and Visitors

In this part, we analyze the change in exchange rates and whether it has a salient effect on the number of tourists. For that purpose, the Euro/Turkish Lira and Dollar/Turkish Lira exchange rates and the total visitors from Europe and the USA between 2008 and 2020 are used as variables.

The related data is gathered from the sources and prepared by cleaning, filtering and reshaping. The preparation code is shown below.

mil <- read_xlsx("milliyetlere_gore_ziyaretci_sayisi.xlsx")
mil <- mil %>%
  mutate(Date=as.yearmon(Date)) %>% mutate(Date=as.Date(Date))
euro_dolar<- read_csv("euro-dollar.csv",guess_max = 100,
                      col_types = cols(
                        "Tarih" = col_date(format="%Y-%m"),
                        "TP DK EUR A YTL" = col_number(),
                        "TP DK USD A YTL" = col_number()
                      ))
# NA values, if any, stem from the source data
# Rename the columns
names(euro_dolar) <- c("Date", "euro_tl_exchange_rate", "dollar_tl_exchange_rate")
# Keep the first 150 rows (the 150 months used in the analysis)
euro_dolar_tidy <- euro_dolar %>%
  as.data.frame() %>%
  slice(1:150)
# Reshape to long format for visualization
euro_dolar_2 <- euro_dolar_tidy %>%
  pivot_longer(!Date, names_to = "exchange_type", values_to = "rate")

4.6.1 Exchange Rate Analysis

The change in the exchange rates of both the Euro and the Dollar is examined from the first month of 2008 until the sixth month of 2020. First the data is preprocessed, and then it is analyzed and visualized with the help of the dplyr and ggplot2 packages.

# Line graph of the exchange rates over time
# (the plot object gets its own name so that it does not shadow ggplot())
exchange_rate_plot <- ggplot(euro_dolar_2, aes(Date, rate, color = exchange_type)) +
  geom_line(size = 1) +
  labs(x = "Years", y = "Exchange Rate", title = "Exchange Rate by Years (2008-2020)", color = "Exchange Rate Type") +
  theme_minimal()

ggplotly(exchange_rate_plot)

From the graph, it can be observed that, within the given time frame, the Euro and Dollar rates rise together, with the Euro being slightly higher than the Dollar. Thus, the Turkish Lira has been losing value for years.

4.6.2 Number of Visitor and Exchange Rate Comparison

In this part, the total number of visitors from Europe and the USA and the Euro and Dollar exchange rates are examined to see whether a correlation and/or trend exists between the two. For that purpose, the euro_dolar_tidy dataset and the related columns (i.e. “Total Europe” and “United States”) of the mil dataset are first joined by date:

usa_and_total_europe<-mil[1:150,c("Date","United States","Total Europe")]
# Join the two datasets on Date (the months match exactly, so the join produces no NA fields)
joined_data<-usa_and_total_europe %>%
  left_join(euro_dolar_tidy,by=c("Date"))

Here, you can see what the combined data looks like:

## # A tibble: 6 x 5
##   Date       `United States` `Total Europe` euro_tl_exchange_~ dollar_tl_exchan~
##   <date>               <dbl>          <dbl>              <dbl>             <dbl>
## 1 2008-01-01           22581         551693               1.72              1.17
## 2 2008-02-01           18562         470599               1.75              1.19
## 3 2008-03-01           29479         732094               1.91              1.23
## 4 2008-04-01           38922         861799               2.04              1.3 
## 5 2008-05-01           80430        1542177               1.94              1.25
## 6 2008-06-01           79917        1611362               1.91              1.23

Total visitors from Europe and Euro/TL exchange rate can be analyzed and visualized with the help of ggplot2 and plotly:

By examining both of the graphs, it can be observed that the total number of tourists from Europe to Turkey oscillates, which means the change in the number of visitors is seasonal (most visitors are observed during summer) rather than correlated with the Euro/TL exchange rate. Although the total number of tourists from Europe has an increasing trend after a significant decrease in 2016, there is still not enough evidence to infer that the increase in tourists is due to the increase in the Euro/TL rate. The sharp decrease in 2016 might be due mostly to the geopolitical risks related to the political climate of Turkey at the time. Also, the sharp decrease after March 2020 stems from the COVID-19 epidemic and its global effects.

A similar seasonality can be observed in the number of visitors from the USA. However, the month in which the number of visitors peaks differs slightly from that of European tourists. For instance, the peak is observed in October in 2008, 2009 and 2011, whereas in 2012, 2013 and 2019 the peak is in July, during the summer season. As with Europe, total visitors decreased during 2016, probably for the same reasons as mentioned above. Again, the sharp decrease after March 2020 stems from the COVID-19 epidemic and its global effects.
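The visual reading above (seasonal pattern, little relation to the exchange rate) can be checked numerically. The following is a rough sketch on simulated data, not the report's figures: visitors and rate are made-up series, and subtracting the month-of-year average is just one simple, assumed way of removing seasonality before computing the correlation.

```r
set.seed(1)  # toy data only; none of these numbers come from the report

months   <- seq(as.Date("2008-01-01"), by = "month", length.out = 48)
month_no <- as.integer(format(months, "%m"))

# Made-up visitors with a summer peak plus noise, and a steadily rising rate
visitors <- 1000 + 800 * sin((month_no - 4) * pi / 6) + rnorm(48, sd = 20)
rate     <- seq(1.7, 2.5, length.out = 48)

# Average visitors per calendar month shows the seasonal profile
monthly_profile <- tapply(visitors, month_no, mean)

# Correlation with the exchange rate, on the raw series and after
# subtracting each month's average (a simple way to strip seasonality)
cor_raw        <- cor(visitors, rate)
deseasonalised <- visitors - as.numeric(monthly_profile[as.character(month_no)])
cor_adjusted   <- cor(deseasonalised, rate)

names(which.max(monthly_profile))  # calendar month with the highest average
c(raw = cor_raw, adjusted = cor_adjusted)
```

With the joined data, the same steps would use the Date, “Total Europe” and euro_tl_exchange_rate columns in place of the simulated vectors.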

4.7 The Effect of Distance Between Turkey and the Other Countries

First, the average monthly number of tourists from each country is calculated for each year. Inside a for loop, the data is rebuilt for every year from 2008 to 2020 to obtain these mean values. Some rows are deleted to eliminate ambiguous non-country data such as “Other Europe Countries”.

A vector of distances (between Turkey and each country, in the same order as the remaining rows) is created to compute the correlation between distance and the number of tourists for each year. This yields a vector of 13 correlation values, one per year.

cor_data <- c()
for (i in 2008:2020) {
  # Average monthly number of visitors per column for year i
  data <- raw_data %>%
    select(-c('Haymatlos', 'Grand Total')) %>%
    filter(format(Date, "%Y") == i) %>%
    pivot_longer(cols = -Date,
                 names_to = "names",
                 values_to = "values") %>%
    group_by(names) %>%
    summarise(avg = mean(values))
  # Drop the non-country rows (e.g. "Other Europe Countries")
  data1 <- data[-c(2,4,7,8,15,24,25,27,28,36,37), ]
distance<-c(10066,2350,13004,1295,12551,1977,1080,2500,2045,5389,1646,2709,1538,10708,915,1979,3189,6077,2637,
            9930,867,2314,3760,8849,2843,2793,7770,792,4566,1744,2682,3301,1048,1802,3596,3322,834,2896,2353,1905,
            4429,8686,8630,1386,2136,2818,3376,410,11286,1249,1575,2130,2063,2000,579,2563,1506,1038,8756,1843,
            11801,1413,1054,3068,12427,2451,3337,1852,3674,1129,1873,13348,8105,1363,1638,1811,2604,518,1931,
            3091,6872,2381,2115,1137,894,10388,2884,16624,1122)
cor_data <- c(cor_data, cor(data1$avg,distance))
}

We can plot the correlation data to decide if there is a relation between number of tourists and distance.

years<-c(2008:2020)
yearly_correlation<-data.frame(Years=years,Correlation=cor_data)
ggplot(yearly_correlation,aes(x=Years, y=Correlation))+geom_col()+ylim(-1,1)+theme_light()+
  scale_x_continuous(breaks = seq(2008, 2020, by = 1))+theme(axis.text.x=element_text(angle = 90))

All the values lie below zero, which indicates a negative relationship: as distance increases, the number of tourists tends to decrease. However, distance clearly does not have a big impact on the number of tourists, because the absolute values of the correlations are close to zero.

In this part, air distance is used to examine whether distance influences the number of tourists. The reason why air distance does not matter much might be that transportation opportunities play a bigger role today than air distance. That is, the number of tourists from farther countries might be high if they have better flight options, with more flights, cheaper tickets, fewer transfers, etc.
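To judge how weak a single yearly coefficient is, it can help to look at it together with a rank correlation and a significance test. The sketch below uses made-up distances and visitor averages for seven hypothetical countries; the Spearman coefficient is an addition for illustration and is not part of the original analysis.

```r
# Made-up air distances (km) and visitor averages for seven hypothetical countries
distance     <- c(500, 900, 1500, 2200, 3000, 8000, 12000)
avg_visitors <- c(90000, 20000, 150000, 30000, 400000, 25000, 10000)

# Pearson correlation, as computed with cor() in the loop above
pearson <- cor(avg_visitors, distance)

# Spearman rank correlation is less sensitive to a few very large values
spearman <- cor(avg_visitors, distance, method = "spearman")

# cor.test() adds a p-value and a confidence interval for the Pearson coefficient
test <- cor.test(avg_visitors, distance)

c(pearson = pearson, spearman = spearman, p_value = test$p.value)
```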

When we consider the countries from which tourists come the most, from the plot “Countries from which Tourists Come the Most” in section 4.1, six of the ten are neighbours of Turkey by land or across the Black Sea: Bulgaria, Georgia, Greece, Iran, Russia and Ukraine. Even if distance does not seem to affect the number of tourists much in general, the situation is different for neighbouring countries. That is, tourists from neighbouring countries may prefer Turkey more because of its closeness.

raw_data%>%
  select(c('Bulgaria', 'France','Georgia','Germany','Greece','Iran','Netherlands','Russia','Ukraine',
           'United Kingdom',Date))%>%
  pivot_longer(.,cols = -Date,
             names_to = "Country", 
             values_to = "Number of Tourists")%>%
  ggplot(.,aes(x=Date,y=`Number of Tourists`,color=Country))+geom_line()+labs(title="Countries from which Tourists Come the Most",subtitle = "Line Plot Version")

As we see in the plot, there is a decrease for all the countries in 2016, which most probably arises from Turkey’s deteriorating image due to the coup attempt. However, the number of Russian tourists is a special case: it starts to decrease in 2015, not in 2016 as for the others. The likely reason is the political tension between Russia and Turkey after Turkey shot down a Russian military aircraft in 2015. The number then increases again with the normalization process at the end of 2016.

4.8 Forecast Plot for Germany

Using the auto.arima function and the data of the previous 150 months, the number of visitors from Germany is forecast and plotted.

germany<-mil[1:150,c(1,2)]
germany_ts<-ts(germany)
fit<-auto.arima(germany_ts[,2])
summary(fit)
## Series: germany_ts[, 2] 
## ARIMA(3,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3      ma1       mean
##       1.2210  -0.2028  -0.3695  -0.6190  382814.82
## s.e.  0.0993   0.1443   0.0843   0.0853   11616.27
## 
## sigma^2 estimated as 1.709e+10:  log likelihood=-1978.49
## AIC=3968.99   AICc=3969.57   BIC=3987.05
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE     MASE       ACF1
## Training set -792.6027 128521.3 101015.6 -146.0323 171.1129 0.825981 0.01975808
fit%>%forecast()%>%autoplot()+
  labs(x="Date",y="Number of Visitors from Germany",title="Forecast of Number of Visitors from Germany")
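In the code above, ts(germany) is created without a frequency, so auto.arima treats the series as non-seasonal, which is consistent with the ARIMA(3,0,1) model in the summary. Since the visitor numbers are strongly seasonal, one possible refinement is to declare the series as monthly. The sketch below shows this on a simulated series (the values are made up, not the Germany data); the auto.arima call runs only if the forecast package is installed.

```r
set.seed(42)  # toy series; the values are made up, not the Germany data

# 150 monthly observations starting in January 2008, with a summer peak
month_no <- rep(1:12, length.out = 150)
visitors <- 380000 + 200000 * sin((month_no - 4) * pi / 6) + rnorm(150, sd = 20000)

# frequency = 12 marks the series as monthly, so seasonal terms can be considered
visitors_ts <- ts(visitors, start = c(2008, 1), frequency = 12)

# Optional step: fit a model only when the forecast package is available
if (requireNamespace("forecast", quietly = TRUE)) {
  seasonal_fit <- forecast::auto.arima(visitors_ts)
  print(seasonal_fit)
}
```

With the real data, the second column of germany would take the place of the simulated visitors vector.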

5 Conclusion

Turkey has high potential to attract tourists, especially during the summer season, thanks to its nature, its cultural and historical heritage, and its long coastline with beautiful beaches. Tourists may choose Turkey because of this potential, or they may stay away for other reasons.

In this analysis, the number of tourists is examined in detail from different aspects:

  • Terrorism in Turkey: In the annual analysis, an increase in the number of terrorism events is accompanied by a decrease in the number of touristic visits. Although the terrorism data frame only covers the years up to 2017, the visualization makes the negative correlation between the two clear.
  • Development of the countries from which tourists come: Turkey attracts visitors mostly from countries in the 1st and 3rd quantiles of development level, as measured by GDP per capita and the Human Development Index.
  • Visa policy of Turkey: Even though many countries can enter Turkey without a visa, the others visit much more than the visa-free ones. So, in this data, the need for a visa does not appear to reduce the number of tourists to Turkey.
  • Exchange rates: Although the total number of tourists from Europe has an increasing trend after a significant decrease in 2016, there is still not enough evidence to infer that the increase in tourists is due to the increase in the Euro/TL rate. A similar conclusion holds for visitors from the USA.
  • Distance between Turkey and the other countries: The correlation between distance and the number of tourists is low, so air distance does not have a big impact on the number of tourists. Nevertheless, the relationship is negative, since all the yearly values are below zero. The reason why air distance does not matter much might be that transportation opportunities play a bigger role today than air distance.